The Enhancement of Semijoin Strategies in Distributed Query Optimization
Abstract
We investigate the problem of optimizing distributed queries by using semijoins to minimize the amount of data communicated between sites. The problem is reduced to that of finding an optimal semijoin sequence that locally fully reduces the relations referenced in a general query graph before the join operations are processed.
The optimization of general queries in a distributed database system is an important and challenging research issue. The problem is to determine a sequence of database operations that processes the query while minimizing a predetermined cost function. Join is a frequently used database operation. It is also the most expensive, especially in a distributed database system, where it may incur large communication costs when the relations are located at different sites. Hence, instead of performing joins in one step, semijoins [1] are performed first to reduce the size of the relations and thereby minimize the data transmission cost of processing queries [2]. In the next step, joins are performed on the reduced relations. A semijoin that reduces S by R on the join attribute A is performed as follows: (i) project R on the join attribute A (i.e., R(A)); (ii) ship R(A) to the site containing S; (iii) join S with R(A). The transmission cost of sending S to the site containing R for the join R ⋈ S can thus be reduced.
There are two main methods to process a join operation between two relations. One is called the nondistributed join, where a join is performed between two unfragmented relations. The other is called the distributed join, where the join operation is performed between the fragments of relations. As pointed out in [5], the problem of query processing has been proved to be NP-hard, which justifies resorting to heuristics.
The remainder of this paper is organized as follows: preliminaries are given in Section 2. Section 3 defines the main characteristics of two semijoin-based query optimization heuristics; then, we present and discuss join query optimization in a fragmented database. Finally, Section 5 concludes the paper.
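The three semijoin steps described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, only a toy model under stated assumptions: relations are in-memory lists of dictionaries, "shipping" between sites is simulated by passing values between functions, and the names `project`, `reduce_by_semijoin`, and `join` are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, Hashable, List, Set

Row = Dict[str, Hashable]
Relation = List[Row]


def project(relation: Relation, attr: str) -> Set[Hashable]:
    """Step (i): project a relation on the join attribute, dropping duplicates."""
    return {row[attr] for row in relation}


def reduce_by_semijoin(s: Relation, shipped: Set[Hashable], attr: str) -> Relation:
    """Step (iii): keep only the rows of S whose join value occurs in R(A)."""
    return [row for row in s if row[attr] in shipped]


def join(r: Relation, s: Relation, attr: str) -> Relation:
    """Plain nested-loop equijoin on a single attribute."""
    return [{**r_row, **s_row} for r_row in r for s_row in s
            if r_row[attr] == s_row[attr]]


# Hypothetical data: R is stored at one site, S at another.
R = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}]
S = [{"A": 2, "y": "s1"}, {"A": 3, "y": "s2"}, {"A": 4, "y": "s3"}]

r_a = project(R, "A")                        # (i) computed at R's site
# (ii) only r_a, not all of R, is shipped to S's site (simulated here)
s_reduced = reduce_by_semijoin(S, r_a, "A")  # (iii) computed at S's site

# Shipping s_reduced instead of S to R's site yields the same join result.
result = join(R, s_reduced, "A")
```

In this toy example only one of the three rows of S survives the reduction, so the final transfer of S to the site of R is smaller, while the join result is unchanged. Whether the semijoin pays off in practice depends on the cost of shipping R(A) relative to the size reduction of S.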
Similar resources
Investigating the 2-way Semijoin for Distributed Query Optimization
With increased globalization, most databases are now highly distributed. Thus, there is a need to reduce the cost of queries that require data from several locations. In the literature, the semijoin is the most commonly used operation for the reduction phase of any distributed query optimization strategy. In [7] an improvement, called the 2-way semijoin, is proposed. The authors conclude that t...
A New Client-server Architecture for Distributed Query Processing
This paper presents the idea of "tuple bit-vectors" for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially "free" backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing perform...
An Intelligent Search Method for Query Optimization by Semijoins
Query optimization strategies based on the reduction of the referenced relations by means of semijoins have received considerable attention. The limitations of such strategies have to do with computational efficiency (very large search space of semijoin reduction sequences), optimality of the solution (when heuristics are used), and generality of the class of queries allowed (e.g., simple queries...
Better Semijoins Using Tuple Bit-vectors
This paper presents the idea of "tuple bit-vectors" for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially "free" backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing perform...
Optimizing Entity Join Queries by Extended Semijoins in a Wide Area Multidatabase Environment
In this paper we consider processing entity join queries in a wide area multidatabase environment where the query processing cost is dominated by the cost of data transmission. An entity join operation integrates tuples representing the same entities from different relations in which inconsistent data may exist. The semijoin technique has been successfully used in a distributed database system ...
Publication date: 1998